Automatic Identification of Lifestyle and Environmental Factors from Social History in Clinical Text

Authors

  • Meliha Yetisgen-Yildiz
  • Elena Pellicer
  • David R. Crosslin
  • Lucy Vanderwende
Abstract

Lifestyle and environmental factors play a significant role in both clinical research and clinical care. In clinical research, it has been established that 5-10% of cancers can be attributed to hereditary factors, while 90-95% have been found to correlate with lifestyle and environmental factors such as smoking, diet and exercise. In clinical care, it has long been standard practice to record social history, as this history affects not only diagnosis but also treatment options. We therefore propose in this work to automatically identify the lifestyle and environmental factors that clinical caregivers have documented. We extended Milton et al.'s analysis of social and behavior information and Uzuner et al.'s work on smoking information in discharge summaries.

Dataset

We created a corpus from the MTSamples website (http://www.mtsamples.com/), which provides a large collection of publicly available transcribed medical records. We scraped 516 history and physical notes, since these reports contain very rich social history information. We applied our in-house statistical section chunker (http://depts.washington.edu/bionlp/index.html?software) and identified 342 sections tagged as social history in the 516 reports for annotation.

Annotation Process

We created a detailed annotation guideline covering the following lifestyle and environmental factors: (1) substance abuse (smoking, alcohol and drug use), (2) occupation, (3) marital status, (4) family information, (5) residence, (6) living situation, (7) environmental exposures, (8) physical activity, (9) weight management, (10) sexual history, and (11) infectious disease history. We then defined 9 dimensions that might apply to each type of factor. For example, for substance abuse (1), annotations are made regarding status (possible values: past, current, none, unknown), time frame (e.g. since 2010), method (e.g. drink, inhale, inject), type (e.g. cigarettes, wine, cocaine), amount (e.g. number of cigarettes or drinks), frequency (e.g. daily, socially, rarely), and history (e.g. after 10 years of smoking), while for occupation (2), the location and extent (e.g. part-time, night-shift) dimensions are annotated.

Using the BRAT rapid annotation tool, two annotators each annotated 20 social history sections. In the first round, inter-rater agreement was 0.59 F1 for the 11 lifestyle and environmental factors and their 9 dimensions. The annotators met and resolved all conflicts, and the annotation guideline was updated. A single annotator is in the process of annotating the rest of the dataset; annotation of 120 social history sections has been completed.

Conclusion

The social history section in clinical text contains a wealth of information regarding a patient's lifestyle and environmental factors, which can be used in both clinical care and clinical research. We are in the process of building automated extractors based on the annotated set. We will release both the annotated corpus and the extractors to the research community. Our research goal is to apply these extractors to EMRs to facilitate robust correlation studies between these factors and disease outcomes.

Acknowledgements

This work was supported by the University of Washington Institute of Translational Health Sciences (UL1TR000423).
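
As an illustration of the inter-rater agreement figure reported under Annotation Process, the following is a minimal sketch of how an F1 score between two annotators could be computed. The abstract does not state the matching criterion, so exact span matching is an assumption here, and the `Annotation` class, its field names, and the toy spans are hypothetical and are not the authors' schema or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One annotated span; field names are illustrative, not the authors' schema."""
    section_id: str   # which social history section the span belongs to
    factor: str       # e.g. "substance_abuse", "occupation"
    dimension: str    # e.g. "status", "type", "frequency"
    start: int        # character offset where the span begins
    end: int          # character offset where the span ends (exclusive)


def agreement_f1(annotator_a, annotator_b):
    """Exact-match F1 between two annotators, treating A as the reference."""
    a, b = set(annotator_a), set(annotator_b)
    if not a and not b:
        return 1.0  # both annotators marked nothing: trivially in agreement
    matched = len(a & b)
    if matched == 0:
        return 0.0
    precision = matched / len(b)
    recall = matched / len(a)
    return 2 * precision * recall / (precision + recall)


# Toy example for the sentence "Smokes cigarettes daily. Works part-time."
annotator_a = [
    Annotation("sec1", "substance_abuse", "method", 0, 6),
    Annotation("sec1", "substance_abuse", "type", 7, 17),
    Annotation("sec1", "substance_abuse", "frequency", 18, 23),
    Annotation("sec1", "occupation", "extent", 31, 40),
]
annotator_b = [
    Annotation("sec1", "substance_abuse", "type", 7, 17),
    Annotation("sec1", "substance_abuse", "frequency", 18, 23),
    Annotation("sec1", "occupation", "extent", 25, 40),  # different span boundary
]

score = agreement_f1(annotator_a, annotator_b)
print(f"F1 = {score:.2f}")  # prints "F1 = 0.57"
```

In this toy case two spans match exactly, so precision is 2/3, recall is 2/4, and F1 is 4/7. Because F1 is symmetric in precision and recall, swapping which annotator is treated as the reference leaves the score unchanged. A relaxed criterion (for example, counting overlapping spans as matches) would give a higher score on the same data.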





Publication date: 2016